Goto

Collaborating Authors

 target localization


Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension

Neural Information Processing Systems

This paper explores the weakly-supervised referring image segmentation (WRIS) problem, and focuses on a challenging setup where target localization is learned directly from image-text pairs. We note that the input text description typically already contains detailed information on how to localize the target object, and we also observe that humans often follow a step-by-step comprehension process (\ie, progressively utilizing target-related attributes and relations as cues) to identify the target object. Hence, we propose a novel Progressive Comprehension Network (PCNet) to leverage target-related textual cues from the input description for progressively localizing the target object.Specifically, we first use a Large Language Model (LLM) to decompose the input text description into short phrases. These short phrases are taken as target-related cues and fed into a Conditional Referring Module (CRM) in multiple stages, to allow updating the referring text embedding and enhance the response map for target localization in a multi-stage manner.Based on the CRM, we then propose a Region-aware Shrinking (RaS) loss to constrain the visual localization to be conducted progressively in a coarse-to-fine manner across different stages.Finally, we introduce an Instance-aware Disambiguation (IaD) loss to suppress instance localization ambiguity by differentiating overlapping response maps generated by different referring texts on the same image. Extensive experiments show that our method outperforms SOTA methods on three common benchmarks.



Joint Cooperative and Non-Cooperative Localization in WSNs with Distributed Scaled Proximal ADMM Algorithms

arXiv.org Artificial Intelligence

Abstract--Cooperative and non-cooperative localization frequently arise together in wireless sensor networks, particularly when sensor positions are uncertain and targets are unable to communicate with the network. While joint processing can eliminate the delay in target estimation found in sequential approaches, it introduces complex variable coupling, posing challenges in both modeling and optimization. This paper presents a joint modeling approach that formulates cooperative and non-cooperative localization as a single optimization problem. T o address the resulting coupling, we introduce auxiliary variables that enable structural decoupling and distributed computation. Building on this formulation, we develop the Scaled Proximal Alternating Direction Method of Multipliers for Joint Cooperative and Non-Cooperative Localization (SP-ADMM-JCNL). Leveraging the problem's structured design, we provide theoretical guarantees that the algorithm generates a sequence converging globally to the Karush-Kuhn-T ucker (KKT) point of the reformulated problem and further to a critical point of the original non-convex objective function, with a sublinear rate of O(1/T). Experiments on both synthetic and benchmark datasets demonstrate that SP-ADMM-JCNL achieves accurate and reliable localization performance. Index T erms--Joint localization; Consensus; Distributed algorithm; Scaled Proximal ADMM.


A Recursive Total Least Squares Solution for Bearing-Only Target Motion Analysis and Circumnavigation

arXiv.org Artificial Intelligence

Bearing-only Target Motion Analysis (TMA) is a promising technique for passive tracking in various applications as a bearing angle is easy to measure. Despite its advantages, bearing-only TMA is challenging due to the nonlinearity of the bearing measurement model and the lack of range information, which impairs observability and estimator convergence. This paper addresses these issues by proposing a Recursive Total Least Squares (RTLS) method for online target localization and tracking using mobile observers. The RTLS approach, inspired by previous results on Total Least Squares (TLS), mitigates biases in position estimation and improves computational efficiency compared to pseudo-linear Kalman filter (PLKF) methods. Additionally, we propose a circumnavigation controller to enhance system observability and estimator convergence by guiding the mobile observer in orbit around the target. Extensive simulations and experiments are performed to demonstrate the effectiveness and robustness of the proposed method. The proposed algorithm is also compared with the state-of-the-art approaches, which confirms its superior performance in terms of both accuracy and stability.


IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

arXiv.org Artificial Intelligence

Visual navigation with an image as goal is a fundamental and challenging problem. Conventional methods either rely on end-to-end RL learning or modular-based policy with topological graph or BEV map as memory, which cannot fully model the geometric relationship between the explored 3D environment and the goal image. In order to efficiently and accurately localize the goal image in 3D space, we build our navigation system upon the renderable 3D gaussian (3DGS) representation. However, due to the computational intensity of 3DGS optimization and the large search space of 6-DoF camera pose, directly leveraging 3DGS for image localization during agent exploration process is prohibitively inefficient. To this end, we propose IGL-Nav, an Incremental 3D Gaussian Localization framework for efficient and 3D-aware image-goal navigation. Specifically, we incrementally update the scene representation as new images arrive with feed-forward monocular prediction. Then we coarsely localize the goal by leveraging the geometric information for discrete space matching, which can be equivalent to efficient 3D convolution. When the agent is close to the goal, we finally solve the fine target pose with optimization via differentiable rendering. The proposed IGL-Nav outperforms existing state-of-the-art methods by a large margin across diverse experimental configurations. It can also handle the more challenging free-view image-goal setting and be deployed on real-world robotic platform using a cellphone to capture goal image at arbitrary pose. Project page: https://gwxuan.github.io/IGL-Nav/.


Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension

Neural Information Processing Systems

This paper explores the weakly-supervised referring image segmentation (WRIS) problem, and focuses on a challenging setup where target localization is learned directly from image-text pairs. We note that the input text description typically already contains detailed information on how to localize the target object, and we also observe that humans often follow a step-by-step comprehension process (\ie, progressively utilizing target-related attributes and relations as cues) to identify the target object. Hence, we propose a novel Progressive Comprehension Network (PCNet) to leverage target-related textual cues from the input description for progressively localizing the target object.Specifically, we first use a Large Language Model (LLM) to decompose the input text description into short phrases. These short phrases are taken as target-related cues and fed into a Conditional Referring Module (CRM) in multiple stages, to allow updating the referring text embedding and enhance the response map for target localization in a multi-stage manner.Based on the CRM, we then propose a Region-aware Shrinking (RaS) loss to constrain the visual localization to be conducted progressively in a coarse-to-fine manner across different stages.Finally, we introduce an Instance-aware Disambiguation (IaD) loss to suppress instance localization ambiguity by differentiating overlapping response maps generated by different referring texts on the same image. Extensive experiments show that our method outperforms SOTA methods on three common benchmarks.


FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following

arXiv.org Artificial Intelligence

Robotic instruction following tasks require seamless integration of visual perception, task planning, target localization, and motion execution. However, existing task planning methods for instruction following are either data-driven or underperform in zero-shot scenarios due to difficulties in grounding lengthy instructions into actionable plans under operational constraints. To address this, we propose FlowPlan, a structured multi-stage LLM workflow that elevates zero-shot pipeline and bridges the performance gap between zero-shot and data-driven in-context learning methods. By decomposing the planning process into modular stages--task information retrieval, language-level reasoning, symbolic-level planning, and logical evaluation--FlowPlan generates logically coherent action sequences while adhering to operational constraints and further extracts contextual guidance for precise instance-level target localization. Benchmarked on the ALFRED and validated in real-world applications, our method achieves competitive performance relative to data-driven in-context learning methods and demonstrates adaptability across diverse environments. This work advances zero-shot task planning in robotic systems without reliance on labeled data. Project website: https://instruction-following-project.github.io/.


Adaptive Target Localization under Uncertainty using Multi-Agent Deep Reinforcement Learning with Knowledge Transfer

arXiv.org Artificial Intelligence

Target localization is a critical task in sensitive applications, where multiple sensing agents communicate and collaborate to identify the target location based on sensor readings. Existing approaches investigated the use of Multi-Agent Deep Reinforcement Learning (MADRL) to tackle target localization. Nevertheless, these methods do not consider practical uncertainties, like false alarms when the target does not exist or when it is unreachable due to environmental complexities. To address these drawbacks, this work proposes a novel MADRL-based method for target localization in uncertain environments. The proposed MADRL method employs Proximal Policy Optimization to optimize the decision-making of sensing agents, which is represented in the form of an actor-critic structure using Convolutional Neural Networks. The observations of the agents are designed in an optimized manner to capture essential information in the environment, and a team-based reward functions is proposed to produce cooperative agents. The MADRL method covers three action dimensionalities that control the agents' mobility to search the area for the target, detect its existence, and determine its reachability. Using the concept of Transfer Learning, a Deep Learning model builds on the knowledge from the MADRL model to accurately estimating the target location if it is unreachable, resulting in shared representations between the models for faster learning and lower computational complexity. Collectively, the final combined model is capable of searching for the target, determining its existence and reachability, and estimating its location accurately. The proposed method is tested using a radioactive target localization environment and benchmarked against existing methods, showing its efficacy.


A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO

arXiv.org Artificial Intelligence

The rapid advancements in unmanned aerial vehicles (UAVs) have unlocked numerous applications, including environmental monitoring, disaster response, and agricultural surveying. Enhancing the collective behavior of multiple decentralized UAVs can significantly improve these applications through more efficient and coordinated operations. In this study, we explore a Recurrent PPO model for target localization in perceptually degraded environments like places without GNSS/GPS signals. We first developed a single-drone approach for target identification, followed by a decentralized two-drone model. Our approach can utilize two types of sensors on the UAVs, a detection sensor and a target signal sensor. The single-drone model achieved an accuracy of 93%, while the two-drone model achieved an accuracy of 86%, with the latter requiring fewer average steps to locate the target. This demonstrates the potential of our method in UAV swarms, offering efficient and effective localization of radiant targets in complex environmental conditions.


Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers

arXiv.org Artificial Intelligence

The paper introduces a novel framework for safe and autonomous aerial physical interaction in industrial settings. It comprises two main components: a neural network-based target detection system enhanced with edge computing for reduced onboard computational load, and a control barrier function (CBF)-based controller for safe and precise maneuvering. The target detection system is trained on a dataset under challenging visual conditions and evaluated for accuracy across various unseen data with changing lighting conditions. Depth features are utilized for target pose estimation, with the entire detection framework offloaded into low-latency edge computing. The CBF-based controller enables the UAV to converge safely to the target for precise contact. Simulated evaluations of both the controller and target detection are presented, alongside an analysis of real-world detection performance.